[FLINK-27773][Web Dashboard] Top N Metrics Dashboard#27774

Open
featzhang wants to merge 1 commit into apache:master from featzhang:feature/FLINK-top-n-metrics-dashboard

@featzhang
Member

Purpose

This PR fixes fundamental architectural issues, identified during CI analysis, in the Top N Metrics Dashboard implementation. The previous implementation had design flaws that prevented it from working correctly: it used the wrong handler base class, referenced a non-existent HTTP method type, and accessed private MetricStore members.

Changes

  1. Fixed REST handler inheritance - Now properly extends AbstractRestHandler instead of an incorrect base class
  2. Fixed MessageHeaders implementation - Now implements RuntimeMessageHeaders with the correct method signatures (getRequestClass, getResponseClass, getResponseStatusCode, getHttpMethod)
  3. Fixed MetricStore access - Uses public APIs instead of attempting to access private members (JobMetricStore, TaskMetricStore.subtasks):
    • metricStore.getRepresentativeAttempts() to get the job's tasks
    • taskMetricStore.getAllSubtaskMetricStores() to get the subtasks
  4. Fixed HTTP method references - Uses HttpMethodWrapper instead of the non-existent HttpMethod
  5. Added proper logging - Added a Logger instance for better error tracking
  6. Added handler registration - Registered TopNMetricsHandler in WebMonitorEndpoint
  7. Moved the response body to the correct package - Moved from legacy.messages to the job.metrics package
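The access pattern in item 3 (reading nested job, task, and subtask metrics only through public accessors) can be illustrated with plain Java collections. This is a minimal sketch under stated assumptions: `TaskStoreSketch`, `SubtaskStoreSketch`, `getAllSubtaskStores()`, and `collect()` are hypothetical stand-ins, not Flink's `MetricStore` classes, and the metric name `backPressuredTimeMsPerSecond` is only an example input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for nested metric stores; these are NOT Flink's MetricStore classes.
public class MetricWalkSketch {

    /** Metrics of one subtask, keyed by metric name; values are strings as in a metric dump. */
    static final class SubtaskStoreSketch {
        private final Map<String, String> metrics = new LinkedHashMap<>();

        void put(String name, String value) {
            metrics.put(name, value);
        }

        /** Public read accessor; returns null if the metric was never reported. */
        String getMetric(String name) {
            return metrics.get(name);
        }
    }

    /** All subtasks of one task, keyed by subtask index. */
    static final class TaskStoreSketch {
        private final Map<Integer, SubtaskStoreSketch> subtasks = new LinkedHashMap<>();

        /** Returns the store for a subtask, creating it on first use. */
        SubtaskStoreSketch subtask(int index) {
            return subtasks.computeIfAbsent(index, i -> new SubtaskStoreSketch());
        }

        /** Exposes subtasks as a read-only view instead of leaking the private map. */
        Map<Integer, SubtaskStoreSketch> getAllSubtaskStores() {
            return Collections.unmodifiableMap(subtasks);
        }
    }

    /** One numeric sample tagged with the task and subtask it came from. */
    static final class Sample {
        final String taskId;
        final int subtaskIndex;
        final double value;

        Sample(String taskId, int subtaskIndex, double value) {
            this.taskId = taskId;
            this.subtaskIndex = subtaskIndex;
            this.value = value;
        }

        @Override
        public String toString() {
            return taskId + "[" + subtaskIndex + "]=" + value;
        }
    }

    /** Reads one metric from every subtask of every task, skipping missing or non-numeric values. */
    static List<Sample> collect(Map<String, TaskStoreSketch> tasks, String metricName) {
        List<Sample> samples = new ArrayList<>();
        for (Map.Entry<String, TaskStoreSketch> task : tasks.entrySet()) {
            for (Map.Entry<Integer, SubtaskStoreSketch> sub :
                    task.getValue().getAllSubtaskStores().entrySet()) {
                String raw = sub.getValue().getMetric(metricName);
                if (raw == null) {
                    continue; // metric not reported by this subtask
                }
                try {
                    samples.add(new Sample(task.getKey(), sub.getKey(), Double.parseDouble(raw)));
                } catch (NumberFormatException e) {
                    // Skip a malformed value instead of failing the whole request.
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        TaskStoreSketch source = new TaskStoreSketch();
        source.subtask(0).put("backPressuredTimeMsPerSecond", "120");
        source.subtask(1).put("backPressuredTimeMsPerSecond", "n/a");
        Map<String, TaskStoreSketch> tasks = new LinkedHashMap<>();
        tasks.put("source", source);
        System.out.println(collect(tasks, "backPressuredTimeMsPerSecond"));
    }
}
```

Returning an unmodifiable view keeps the store's internals private while still letting a handler iterate every subtask, which is the property that moving to public accessors is meant to preserve.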

Implementation Details

The implementation now follows Flink's standard REST API architecture pattern:

  • Extends AbstractRestHandler<RestfulGateway, EmptyRequestBody, TopNMetricsResponseBody, TopNMetricsMessageParameters>
  • Implements proper request handling with MetricFetcher integration
  • Uses public MetricStore APIs to safely access metrics data
  • Returns Top N metrics for:
    • CPU consumers (Top 5)
    • Backpressured operators (Top 5)
    • GC-intensive tasks (Top 5)
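Each of the three Top 5 lists reduces to the same selection step. The sketch below is a hypothetical helper, not code from this PR: it keeps a bounded min-heap of size N, so ranking M candidate values costs O(M log N) rather than a full sort. The keys and values in `main` are placeholders.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: picks the N entries with the largest values using a bounded min-heap. */
public class TopNSketch {

    public static <K> List<Map.Entry<K, Double>> topN(Map<K, Double> values, int n) {
        List<Map.Entry<K, Double>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap by value: the head is always the smallest of the current top N.
        PriorityQueue<Map.Entry<K, Double>> heap =
                new PriorityQueue<Map.Entry<K, Double>>(
                        (a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<K, Double> e : values.entrySet()) {
            Double v = e.getValue();
            if (v == null || v.isNaN()) {
                continue; // unreported or undefined metric: never rank it
            }
            if (heap.size() < n) {
                heap.offer(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), v));
            } else if (v > heap.peek().getValue()) {
                heap.poll(); // evict the current smallest to make room
                heap.offer(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), v));
            }
        }
        result.addAll(heap);
        // PriorityQueue iteration order is unspecified, so sort descending for display.
        result.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> busyTime = new LinkedHashMap<>();
        busyTime.put("source[0]", 310.0);
        busyTime.put("map[0]", 905.0);
        busyTime.put("map[1]", 870.0);
        busyTime.put("sink[0]", 40.0);
        System.out.println(topN(busyTime, 5));
    }
}
```

Missing (`null`) and `NaN` values are skipped so that an unreported metric cannot displace a real one; whether the endpoint should treat them the same way is an assumption of this sketch.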

Verifying this change

  • Code compiles successfully (excluding unrelated upstream compilation issues)
  • Follows Flink REST API architecture patterns
  • Uses proper public APIs for MetricStore access
  • Code formatted with Spotless
  • Integration tests (to be added)

Documentation

This adds a new REST endpoint, GET /jobs/:jobid/metrics/top-n, that returns the Top N metrics for a job.
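A client can call the endpoint with the JDK's built-in `java.net.http` client (Java 11+). This sketch assumes the JobManager's REST server is reachable at `localhost:8081` (Flink's default `rest.port`) and that the response is JSON; `TopNRequestSketch` and its methods are hypothetical names, and the response schema is not specified here. The job ID is checked to be 32 hex characters, the textual form of a Flink `JobID`, before it is placed in the path.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client sketch for GET /jobs/:jobid/metrics/top-n. */
public class TopNRequestSketch {

    /** Builds the endpoint URI after checking the job ID is 32 hex characters. */
    static URI topNUri(String host, int port, String jobId) {
        if (!jobId.matches("[0-9a-fA-F]{32}")) {
            throw new IllegalArgumentException("not a job ID: " + jobId);
        }
        return URI.create("http://" + host + ":" + port + "/jobs/" + jobId + "/metrics/top-n");
    }

    /** Prepares a GET request asking for JSON; nothing is sent until a client executes it. */
    static HttpRequest topNRequest(String host, int port, String jobId) {
        return HttpRequest.newBuilder(topNUri(host, port, jobId))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TopNRequestSketch <jobId>   (assumes a REST endpoint on localhost:8081)
        HttpRequest request = topNRequest("localhost", 8081, args[0]);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```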

Notes

The previous PR #27771 was closed due to fundamental architectural issues. This implementation addresses all identified issues and follows Flink's standard patterns.

@featzhang featzhang changed the title [FLINK-27773][Web Dashboard] Fix Top N Metrics Dashboard implementation architecture [FLINK-27773][Web Dashboard] Top N Metrics Dashboard implementation architecture Mar 16, 2026
@featzhang featzhang changed the title [FLINK-27773][Web Dashboard] Top N Metrics Dashboard implementation architecture [FLINK-27773][Web Dashboard] Top N Metrics Dashboard Mar 16, 2026
@flinkbot
Collaborator

flinkbot commented Mar 16, 2026

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@featzhang featzhang force-pushed the feature/FLINK-top-n-metrics-dashboard branch from 2c037cf to c61453b on March 24, 2026 08:45


2 participants